10. Choosing Hyperparameter Values
Let's say we are trying to choose the min_samples_leaf hyperparameter and want to avoid overfitting. How many training samples should we require as the minimum per leaf? Outside of finance and time-series problems, setting this hyperparameter is fairly straightforward: you use grid search with cross-validation to find the value that maximizes the model's performance on validation data. With time-series data, you typically don't use cross-validation, because you usually want a single validation dataset that is as close in time as possible to the present. If your problem has a high signal-to-noise ratio, you can still try a bit of parameter tuning on that single validation set. In finance, though, you have both time-series data and a low signal-to-noise ratio. You therefore have only one validation set, and if you were to try a bunch of parameter values on it, you would almost surely be overfitting to it: with so much noise, the best-scoring value is likely to have won by chance. As such, you need to set the parameter with some judgement and minimal trials. Later, we'll discuss a bit more about how we make this choice in the project.
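To make the overfitting risk concrete, here is a minimal simulation sketch. It is not the project's method. It assumes each candidate hyperparameter value yields a model with no real skill (it predicts the direction of a return correctly with probability 0.5) and that the candidates are independent, which real hyperparameter values usually are not. The function names, the 250-sample validation set, and the trial counts are all hypothetical choices for illustration.

```python
import random
import statistics


def best_validation_score(n_trials, n_val_samples, rng):
    """Best validation accuracy among n_trials candidates that have no real skill."""
    best = 0.0
    for _ in range(n_trials):
        # Assumption: each prediction is a coin flip, so any score above 0.5 is pure noise.
        hits = sum(rng.random() < 0.5 for _ in range(n_val_samples))
        best = max(best, hits / n_val_samples)
    return best


def average_best_score(n_trials, n_val_samples=250, n_repeats=200, seed=0):
    """Average the best-of-n_trials validation accuracy over many simulated experiments."""
    rng = random.Random(seed)
    return statistics.mean(
        best_validation_score(n_trials, n_val_samples, rng) for _ in range(n_repeats)
    )


# One trial versus fifty trials on the same single validation set.
few = average_best_score(n_trials=1)
many = average_best_score(n_trials=50)
print(f"average best accuracy with 1 trial:   {few:.3f}")
print(f"average best accuracy with 50 trials: {many:.3f}")
```

With a single trial the average accuracy should sit near 0.5, as expected for a model with no skill. With fifty trials, the best score looks noticeably better than 0.5 even though nothing was learned. Because real candidate values tend to produce correlated models, the effect in practice is likely smaller than in this sketch, but it points the same way, which is why the number of trials is kept small.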